Cross-chip communication mechanism in distributed node topology

ABSTRACT

A method of communicating between processing units on different integrated circuit chips in a multi-processor computer system by issuing a command from a source processing unit to a destination processing unit, receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing registers in clock-controlled components of the destination processing unit without interrupting processing of the program instructions by the destination processing unit. The access may be a read from status or mode registers of the destination processing unit, or a write to control or mode registers. Many processing units can be interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit. Each of the processing units is assigned a respective, unique identification number (PID) in addition to one or more optional “special” tags which are not necessarily unique, and an external command (XSCOM) interface on a given chip recognizes only those commands that include the corresponding chip tag, unless the command is a broadcast. Commands may be directed to subgroups of processors by implementing masks against the PID, a selected portion of the PID, or another “special” tag in a broadcast fashion. The XSCOM interface also has the ability to block any broadcast command (e.g., reset) to itself when that command was issued by its associated processing unit (a “Block Self” mode). The processing units are interconnected via a fabric bus, and the XSCOM interface preferably uses an additional communications line that follows the topology of the fabric bus or could alternately use command/data packets across the existing fabric transmission protocol. The service processor has access to this command interface through an external port (e.g., JTAG) and assembly code running on the processing unit has access to the command interface via special assembly code sequences.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention generally relates to computer systems, and more particularly to an improved method of handling communications between computer components such as processing units of a multi-processor system which are interconnected in a distributed topology.

[0003] 2. Description of the Related Art

[0004] The basic structure of a conventional symmetric multi-processor computer system 10 is shown in FIG. 1. Computer system 10 has one or more processing units arranged in one or more processor groups; in the depicted system, there are four processing units 12 a, 12 b, 12 c and 12 d in processor group 14. The processing units communicate with other components of system 10 via a system or fabric bus 16. Fabric bus 16 is connected to a system memory 20, and various peripheral devices 22. Service processors 18 a, 18 b are connected to processing units 12 via a JTAG interface or other external service port. A processor bridge 24 can optionally be used to interconnect additional processor groups. System 10 may also include firmware (not shown) which stores the system's basic input/output logic, and seeks out and loads an operating system from one of the peripherals whenever the computer system is first turned on (booted).

[0005] System memory 20 (random access memory or RAM) stores program instructions and operand data used by the processing units, in a volatile (temporary) state. Peripherals 22 may be connected to fabric bus 16 via, e.g., a peripheral component interconnect (PCI) local bus using a PCI host bridge. A PCI bridge provides a low latency path through which processing units 12 a, 12 b, 12 c and 12 d may access PCI devices mapped anywhere within bus memory or I/O address spaces. PCI host bridge 22 also provides a high bandwidth path to allow the PCI devices to access RAM 20. Such PCI devices may include a network adapter, a small computer system interface (SCSI) adapter providing interconnection to a permanent storage device (i.e., a hard disk), and an expansion bus bridge such as an industry standard architecture (ISA) expansion bus for connection to input/output (I/O) devices including a keyboard, a graphics adapter connected to a display device, and a graphical pointing device (mouse) for use with the display device.

[0006] In a symmetric multi-processor (SMP) computer, all of the processing units 12 a, 12 b, 12 c and 12 d are generally identical, that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. As shown with processing unit 12 a, each processing unit may include one or more processor cores 26 a, 26 b which carry out program instructions in order to operate the computer. An exemplary processor core includes the PowerPC™ processor marketed by International Business Machines Corp. which comprises a single integrated circuit superscalar microprocessor having various execution units, registers, buffers, memories, and other functional units, which are all formed by integrated circuitry. The processor cores may operate according to reduced instruction set computing (RISC) techniques, and may employ both pipelining and out-of-order execution of instructions to further improve the performance of the superscalar architecture.

[0007] Each processor core 26 a, 26 b includes an on-board (L1) cache (actually, separate instruction cache and data caches) implemented using high speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from system memory 20. A processing unit can include another cache, i.e., a second level (L2) cache 28 which, along with a memory controller 30, supports both of the L1 caches that are respectively part of cores 26 a and 26 b. Additional cache levels may be provided, such as an L3 cache 32 which is accessible via fabric bus 16. Each cache level, from highest (L1) to lowest (L3) can successively store more information, but at a longer access penalty. For example, the on-board L1 caches in the processor cores might have a storage capacity of 128 kilobytes of memory, L2 cache 28 might have a storage capacity of 512 kilobytes, and L3 cache 32 might have a storage capacity of 2 megabytes. To facilitate repair/replacement of defective processing unit components, each processing unit 12 a, 12 b, 12 c, 12 d may be constructed in the form of a replaceable circuit board or similar field replaceable unit (FRU), which can be easily installed in or swapped out of system 10 in a modular fashion.

[0008] As multi-processor, or multi-chip, computer systems increase in size and complexity, an excess amount of time can be consumed by the overall system in performing various supervisory operations, e.g., initializing each chip at boot time (IPL) or for some other system reset. Most of the supervisory commands that are issued from the service processor to each chip are the same, introducing a degree of redundancy in the procedures that causes a small problem in small systems, but scales to a bigger problem as the system gets bigger. An exemplary state-of-the-art multi-processor system might have four drawers of processing units, with two multi-chip modules (MCMs) in each drawer, and four processing units in each MCM, for a total of 32 processing units. This construction leads to a long boot time as the service processor must sequentially send initialization commands to each of the 32 processing units. The problem can additionally arise with other commands that might be issued after initialization, such as cumulative status checking, or reading fault isolation registers (FIRs).

[0009] This problem applies to supervisory routines running on the service processor and also any supervisory routines that might be running on one of the processor cores, since a core cannot directly control other chips in the system without communicating with the service processor, which creates a communications bottleneck. Moreover, this type of usage of the service processor represents a somewhat centralized control structure, and the trend in modern computing is to move away from such centralized control since it presents a single failure point that can cause a system-wide shutdown.

[0010] In some prior art multi-processor topologies, data pathways may be provided directly between processing units to allow sharing of memory, but these pathways are inappropriate for handling system-wide commands. The inter-chip data pathways have limited functionality, and are part of the clock-controlled domains of the chips. Accordingly, any attempted use of these pathways for supervisory commands would interrupt operation of the processing units and adversely affect overall system performance.

[0011] In light of the foregoing, it would be desirable to devise a communications mechanism for a multi-processor computer system which facilitates transmission of system-level (e.g., supervisory) commands to different chip components such as processor cores and memory subsystems. It would be further advantageous if the mechanism could allow such commands to issue and execute while the processing units are running, that is, without interruption.

SUMMARY OF THE INVENTION

[0012] It is therefore one object of the present invention to provide an improved method of communications between chips or processing units in a multi-processor computer system.

[0013] It is another object of the present invention to provide such a method which facilitates transmission of system-wide or supervisory-level commands to multiple processing units.

[0014] It is yet another object of the present invention to provide a mechanism for cross-chip communications in a distributed node topology which does not exclusively rely on a centralized command structure that might present a communications bottleneck.

[0015] The foregoing objects are achieved in a method of communicating between processing units in a multi-processor computer system, generally comprising the steps of issuing a command from a source processing unit to a destination processing unit (wherein the source and destination processing units are physically located on different integrated circuit chips), receiving the command at the destination processing unit while the destination processing unit is processing program instructions, and accessing registers in clock-controlled components of the destination processing unit in response to the command, without interrupting processing of the program instructions by the destination processing unit. The access may take the form of reading data from status or mode registers of the destination processing unit, or writing data to control or mode registers of the destination processing unit. In the illustrative embodiment, there are many processing units interconnected in a ring topology, and the access command can be passed from the source processing unit through several other processing units before reaching the destination processing unit. Each of the processing units is assigned a respective, unique identification number (PID) in addition to one or more optional “special” tags which are not necessarily unique, and the external command interface on a given chip recognizes only those commands that include the corresponding chip tag, unless the command is a broadcast command. Additionally, there is an ability to direct the command to one or more subgroups of processors by implementing subset masks against the PID, a selected portion of the PID, or another “special” tag in a broadcast fashion. The external command interface also has the ability to block any broadcast command (e.g., reset) to itself when that command was issued by its associated processing unit (a “Block Self” mode). The processing units are interconnected via a fabric bus, and the external command interface preferably uses an additional communications line that follows the topology of the fabric bus or could alternately use command/data packets across the existing fabric transmission protocol. The service processor has access to this command interface through an external port (e.g., JTAG) and assembly code running on the processing unit has access to the command interface via special assembly code sequences.

[0016] The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.

[0018] FIG. 1 is a block diagram depicting a conventional symmetric multi-processor (SMP) computer system, with internal details shown for one of the four generally identical processing units;

[0019] FIG. 2 is a block diagram illustrating one embodiment of a processing unit or chip for a computer system, constructed in accordance with the present invention, and having an external scan communications (XSCOM) interface allowing chip-to-chip communications;

[0020] FIG. 3 is a block diagram of a multi-chip module (MCM) utilizing four of the processing units of FIG. 2 which are interconnected in accordance with one implementation of the present invention;

[0021] FIG. 4 is a block diagram of a processor group comprising three drawers which each contain two of the MCMs of FIG. 3 and are interconnected in accordance with one implementation of the present invention; and

[0022] FIG. 5 is a representation of an XSCOM command format in accordance with one implementation of the present invention.

[0023] The use of the same reference symbols in different drawings indicates similar or identical items.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

[0024] With reference now to the figures, and in particular with reference to FIG. 2, there is depicted one embodiment 40 of a processing unit constructed in accordance with the present invention. Processing unit 40 is preferably constructed as a single integrated-circuit chip, and is generally comprised of two processor cores 42 a and 42 b, a memory subsystem 44, a scan communication (SCOM) controller 46, an external SCOM (XSCOM) interface 48, and a JTAG interface 50 connected to a service processor 51. Processor cores 42 a, 42 b and memory subsystem 44 are clock-controlled components, while SCOM controller 46, XSCOM interface 48 and JTAG interface 50 are free-running components. Although two processor cores are shown as included on one integrated chip, there could be fewer or more.

[0025] Each processor core 42 a, 42 b has its own control logic 52 a, 52 b, separate sets of execution units 54 a, 54 b and registers/buffers 56 a, 56 b, respective first level (L1) caches 58 a, 58 b, and load/store units (LSUs) 60 a, 60 b. Execution units 54 a, 54 b include various arithmetic units such as fixed-point units and floating-point units, as well as instruction fetch units and instruction sequencer units. Registers 56 a, 56 b include general-purpose registers, special-purpose registers, and rename buffers. L1 caches 58 a, 58 b (which are preferably comprised of separate instruction and data caches in each core) and load/store units 60 a, 60 b communicate with memory subsystem 44 to read/write data from/to the memory hierarchy. Memory subsystem 44 may include a second level (L2) cache and a memory controller.

[0026] SCOM controller 46 is connected to various “satellites” located in the clock-controlled components. In the embodiment depicted in FIG. 2, there are three SCOM satellites 62 a, 62 b, and 62 c. SCOM satellites 62 a and 62 b are respectively located in the control logic 52 a, 52 b of cores 42 a, 42 b, while SCOM satellite 62 c is located in memory subsystem 44. Only three SCOM satellites are illustrated for simplicity, but those skilled in the art will appreciate that there could be many more satellites located throughout processing unit 40.

[0027] SCOM controller 46 allows the service processor to access the SCOM satellites while the components are still running, via JTAG interface 50. The satellites on a given chip are connected in a ring fashion with SCOM controller 46. These SCOM satellites have internal control and error registers (along with mode, status, and other registers) which can be used to enable and check various functions in the components. Any subset of the registers in any component on the chip may be SCOM-enabled. The chip designer can select whatever configuration might be desirable for the particular application, e.g., fault indicators for a diagnostics routine. In this manner, the service processor can access any chip in the multi-processing system via JTAG interface 50 and access registers while the system is running, without interruption, to set modes, pulse controls, initiate interface alignment procedures, read status of FIRs, etc. SCOM controller 46 carries out these functions by setting an internal command register and an internal data register.

[0028] Assembly code running on a component, particularly in the processor cores 42 a, 42 b, can allow the cores to utilize SCOM features as well. Thus a core can read status bits of another component and control the logic anywhere on its own chip. Using this assembly code and controller 46, a core can further access components on other chips via XSCOM interface 48 (discussed in more detail below). SCOM controller 46 includes appropriate logic to arbitrate among JTAG interface 50, any assembly code commands from the two processor cores, and XSCOM interface 48.

[0029] JTAG interface 50 provides access between the service processor and SCOM controller 46. JTAG interface 50 complies with the Institute of Electrical and Electronics Engineers (IEEE) standard 1149.1 pertaining to a test access port and boundary-scan architecture. SCOM is a scan communications extension that is allowed by standard 1149.1.

[0030] Referring now to FIG. 3, there is depicted one embodiment of a multi-chip module (MCM) 70 constructed in accordance with the present invention. In this embodiment, MCM 70 has four integrated chips 40 a, 40 b, 40 c and 40 d (more or fewer than four could be provided). Each of the four chips 40 a, 40 b, 40 c and 40 d is generally identical to processing unit 40 of FIG. 2. In particular, each processing unit 40 a, 40 b, 40 c, 40 d includes an XSCOM interface 48 which provides external, chip-to-chip communications without requiring the involvement of the service processor. In this manner, one processor chip (other than the service processor) can control all of the remaining processors in the multi-processor system, i.e., read or set status, mode or control bits in the other processing units without interrupting their operation. Alternately the service processor can access the XSCOM facility on a single processor chip and control all remaining processors in the multi-processor system via a single command. This capability removes the necessity of the service processor for some functions, e.g., a system reset. Such system-level commands can now be broadcast by passing them along to each processing unit 40 in a daisy-chain fashion, rather than replicating the command at the service processor and sending it separately to each processing unit. Additionally, for some commands it removes the need for the service processor to sequentially communicate with each processor chip to perform system-level commands by broadcasting them through a single processor chip.

[0031] XSCOM interface 48 utilizes a command register and a data register to carry out the communications (similar to SCOM controller 46). A hardware locking mechanism can be provided to prevent more than one transaction or sequence of related transactions from occurring at a time. Each XSCOM interface is provided with a primary pair of interconnection lines, an input (“Previous Chip”) and an output (“Next Chip”). These lines are used to interconnect the four processing units on MCM 70 in a clockwise ring, i.e., the “Next Chip” line on the first chip is connected to the “Previous Chip” line on the second chip, and so on. Only chip 40 a is allowed to have off-module interconnections. A secondary pair of interconnection lines for the XSCOM interface may be provided (e.g., “Vertical Chip” inputs and outputs) to facilitate intra-drawer communication, depending on the fabric topology. The secondary pair of lines can be selectively enabled.
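The ring interconnection described in this paragraph can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Chip` class, `build_ring`, and `send` are hypothetical names introduced here, and the real mechanism is a hardware line between XSCOM interfaces rather than Python objects. The model simply shows a command leaving on the “Next Chip” output of the source and being handed along until a chip whose PID matches accepts it.

```python
from dataclasses import dataclass, field


@dataclass
class Chip:
    """Hypothetical stand-in for one processing unit on the ring."""
    pid: int
    next_chip: "Chip | None" = None          # models the "Next Chip" output line
    received: list = field(default_factory=list)


def build_ring(pids):
    """Wire each chip's "Next Chip" output to the following chip's input."""
    chips = [Chip(pid) for pid in pids]
    for i, chip in enumerate(chips):
        chip.next_chip = chips[(i + 1) % len(chips)]
    return chips


def send(source, target_pid, payload):
    """Pass payload around the ring starting at source's "Next Chip" line.

    Returns the number of hops taken, or None if the command travels the
    whole ring back to the source without any chip accepting it.
    """
    hops = 0
    chip = source.next_chip
    while True:
        hops += 1
        if chip.pid == target_pid:
            chip.received.append(payload)
            return hops
        if chip is source:
            return None
        chip = chip.next_chip
```

For a four-chip module, a command from the first chip to the last one in ring order passes through the two chips in between, so `send` reports three hops.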

[0032] While each of the processing units 40 a, 40 b, 40 c, 40 d in MCM 70 includes the structures shown in FIG. 2, certain processing units or subsets of the units may be provided with special capabilities as desired, such as additional ports.

[0033] With further reference to FIG. 4, there is depicted one implementation of a processor group 72 adapted for use with a symmetric multi-processor (SMP) computer system in accordance with the present invention. In this particular implementation, processor group 72 is composed of three drawers 74 a, 74 b and 74 c of processing units. Although only three drawers are shown, the processor group could have fewer or additional drawers. The drawers are mechanically designed to slide into an associated frame for physical installation in the SMP system. Each of the processing unit drawers includes two multi-chip modules, for a total of six MCMs 70 a, 70 b, 70 c, 70 d, 70 e and 70 f (again, the construction could include more than two MCMs per drawer, and the processors could be mounted on processor cards or on a backplane depending on desired application). There are accordingly a total of 24 processing units or chips in processor group 72. Processor group 72 is adapted for use in an SMP system which may include other components such as additional memory hierarchy, a communications fabric and peripherals, as discussed in conjunction with FIG. 1. Each individual chip is preferably manufactured as a field replaceable unit (FRU) so that, if a particular chip becomes defective, it can be swapped out for a new, functional unit without necessitating replacement of other parts in the module or drawer. Alternately, the FRU could be an entire drawer such that if any one component goes bad, the entire drawer is more easily replaced.

[0034] One of the MCMs can be designated as the primary module, in this case MCM 70 a, and the primary chip 40 a of that module is controlled directly by a service processor. The MCMs in processor group 72 further utilize the XSCOM communications protocol for module-to-module communications, in a manner similar to that described for FIG. 3. The “Next Chip” line on the primary chip 40 a of a given MCM, such as MCM 70 a, is connected to the “Previous Chip” line on the primary chip 40 a of the next MCM in processor group 72. Some MCMs may utilize a “Vertical Chip” interconnection instead of Previous/Next in order to complete the loop on the end drawers. The MCMs are thus also connected in a clockwise ring or hub topology by the XSCOM interfaces. This topology preferably follows the existing fabric data/command bus topology in wiring.

[0035] Each processing unit is assigned a unique identification number (PID) to enable targeting of transmitted data and commands. The XSCOM mode register can then use tags to target XSCOM commands for selected PIDs. Tags can have a portion that represents the topological (physical) location of the processing unit, as well as another portion that represents functional groupings of the processing units. A portion of the PID or a separate programmable identifier register can be designated as a “special” tag such that one or more processing units with commonality can share commands in their own grouping. Routines can then form groups based on subsets of PIDs or another separate programmable identifier register. Using these special qualifier tags or group subset masks will cause only certain chips to see the command. Thus, commands could be limited to, e.g., only chips with I/O devices attached, or only primary chips 40 a, etc. This protocol can further be enhanced to enable a “block self” broadcast mode, wherein the XSCOM command is issued to every processing unit in the system (or group), except for the broadcasting unit itself. This feature might be particularly useful for resetting other chips without resetting the issuing chip.
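The grouping and “block self” behavior described in this paragraph can be sketched as a simple filter. The function name `recipients`, the use of string labels for groups, and the dictionary layout are all assumptions made for illustration; the actual mechanism compares tag bits in hardware.

```python
def recipients(chips, issuer_pid, group=None, block_self=False):
    """Return the PIDs that would act on a broadcast command.

    chips      : dict mapping PID -> set of group labels (hypothetical model
                 of the "special" tags a chip carries)
    issuer_pid : PID of the chip that issued the broadcast
    group      : if given, only chips carrying this label see the command
    block_self : if True, the issuing chip ignores its own broadcast
    """
    selected = []
    for pid, labels in sorted(chips.items()):
        if group is not None and group not in labels:
            continue  # chip is outside the targeted subgroup
        if block_self and pid == issuer_pid:
            continue  # "block self": issuer does not execute the command
        selected.append(pid)
    return selected
```

With four chips of which PIDs 0 and 2 carry an illustrative `"io"` label, a plain broadcast from chip 0 reaches all four, a block-self broadcast reaches chips 1 through 3, and an `"io"` broadcast with block self reaches only chip 2.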

[0036] Additional data pathways could be provided between the chips on a module or in a group. It would be possible to utilize such pathways for system-wide commands by sending XSCOM packets on the existing communications fabric, but the illustrative implementation utilizes an additional line that follows the fabric topology.

[0037] In the preferred embodiment, the XSCOM data is simply a 64-bit register. It is the source for outgoing data during an XSCOM write access, and the destination for incoming data after an XSCOM read access. The interpretation of the contents of this register is determined by XSCOM status and control bits that are included in the XSCOM command register. An exemplary format for the XSCOM command register 80 is shown in FIG. 5 and is also 64 bits. In this implementation, the format includes 32 reserved bits xscomc(0:31), which are comprised of 21 unused (spare) bits xscomc(0:19,23), 3 special qualifier bits xscomc(20:22), and an 8-bit chip tag xscomc(24:31). Qualifier bit xscomc(20) controls whether to factor the “special tag” into the chip identification procedure for this command; in this embodiment a special upper portion of the chip tag, i.e., the first two bits of the chip PID on each processing unit, is compared against xscomc(24:25). If qualifier bit xscomc(20) is set to zero, this feature is ignored, but when set to one any broadcast command will match only against these top bits xscomc(24:25). Qualifier bit xscomc(21) controls whether to factor into the chip identification procedure only the module portion of the chip PID. Qualifier bit xscomc(22) controls whether to factor into the chip identification only the drawer portion of the chip tag. The drawer ID is contained in chip tag bits xscomc(27:28), while the module ID is contained in chip tag bit xscomc(29) and the ID for the specific chip on a module is contained in chip tag bits xscomc(30:31). A masking ability may also be provided to allow commands to be sent to certain MCMs or subsets of MCMs, or any other arbitrary grouping that can be formed by binary compares of subsets of PID or special tag fields.
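The tag fields and qualifier bits of this paragraph can be made concrete with a short sketch. Two assumptions are made here and are not stated in the text: bit 0 of xscomc is taken to be the most significant bit of the 64-bit register, and when several qualifier bits are set the selected fields are assumed to be compared together, with the whole 8-bit tag compared when none is set. The helper names `get_bits`, `set_bits`, `split_tag`, and `tag_matches` are hypothetical.

```python
WIDTH = 64  # the XSCOM command register is 64 bits wide


def get_bits(reg, first, last):
    """Extract xscomc(first:last), assuming bit 0 is the most significant."""
    width = last - first + 1
    return (reg >> (WIDTH - 1 - last)) & ((1 << width) - 1)


def set_bits(reg, first, last, value):
    """Return reg with xscomc(first:last) replaced by value."""
    width = last - first + 1
    shift = WIDTH - 1 - last
    mask = ((1 << width) - 1) << shift
    return (reg & ~mask) | ((value << shift) & mask)


def split_tag(tag):
    """Split an 8-bit chip tag laid out like xscomc(24:31) into its fields."""
    return {
        "special": (tag >> 6) & 0b11,  # xscomc(24:25)
        "drawer": (tag >> 3) & 0b11,   # xscomc(27:28)
        "module": (tag >> 2) & 0b1,    # xscomc(29)
        "chip": tag & 0b11,            # xscomc(30:31)
    }


def tag_matches(cmd, own_tag):
    """Decide whether a chip with own_tag would recognize the command."""
    cmd_tag = get_bits(cmd, 24, 31)
    qualifiers = (("special", 20), ("module", 21), ("drawer", 22))
    selected = [name for name, bit in qualifiers if get_bits(cmd, bit, bit)]
    if not selected:
        return cmd_tag == own_tag  # assumed default: compare the whole tag
    wanted, own = split_tag(cmd_tag), split_tag(own_tag)
    return all(wanted[name] == own[name] for name in selected)
```

Setting only qualifier bit xscomc(22) in a command makes `tag_matches` accept every chip whose drawer field equals the command's drawer field, regardless of module or chip number.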

[0038] The XSCOM command format also includes a 16-bit SCOM address xscomc(32:47), six control bits xscomc(48:53), and ten status bits xscomc(54:63). The 16-bit SCOM address is used to target a particular SCOM satellite on the destination chip for receiving the command. The first control bit xscomc(48) identifies whether the command is a read request or a write request. Control bits xscomc(49:51) are used in broadcasts, and the first of these bits, xscomc(49), simply flags a broadcast command. The second broadcast control bit xscomc(50) identifies whether the broadcast command is to be accepted by every satellite (this bit is effective only when xscomc(49) is active). The third broadcast control bit xscomc(51) identifies whether the read data is to be OR'd or AND'd by each satellite (this bit is also effective only when broadcast bit xscomc(49) is active). The fifth control bit xscomc(52) is utilized to implement the “Block Self” broadcast mode wherein a broadcast command is to be executed by each chip except for the originating chip itself. The last control bit xscomc(53) is unused.
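A decoder for the address and control fields listed in this paragraph might look like the following sketch. It assumes bit 0 of xscomc is the most significant bit of the register, which the text does not state, and it reports raw bit values because the polarity of the read/write and OR/AND bits is not given. The names `bits` and `decode_controls` and the dictionary keys are hypothetical.

```python
def bits(reg, first, last):
    """Extract xscomc(first:last) from a 64-bit value (bit 0 assumed MSB)."""
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)


def decode_controls(cmd):
    """Return the SCOM address and control bits of an XSCOM command.

    The two broadcast sub-options are reported as None when the broadcast
    flag xscomc(49) is inactive, since they are effective only with it set.
    """
    broadcast = bits(cmd, 49, 49)
    return {
        "scom_address": bits(cmd, 32, 47),
        "read_write": bits(cmd, 48, 48),      # raw bit; polarity unspecified
        "broadcast": broadcast,
        "all_satellites": bits(cmd, 50, 50) if broadcast else None,
        "or_and_select": bits(cmd, 51, 51) if broadcast else None,
        "block_self": bits(cmd, 52, 52),
    }
```

Gating the two sub-option fields on the broadcast flag keeps a caller from acting on bits that the format defines as having no effect.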

[0039] As mentioned above, the XSCOM interface contains a hardware locking mechanism to prevent more than one transaction or sequence of related transactions from occurring at a time, since both of the cores on a chip have access to this facility, as does the service processor via the JTAG interface. This locking could be handled through a mailbox or software interface but a hardware mechanism is provided for convenience. The first six status bits xscomc(54:59) can be used as lock bits for this purpose. The first lock bit xscomc(54) identifies a lock placed by the service processor. The second lock bit xscomc(55) is unused. The last four lock bits xscomc(56:59) identify locks placed by different threads operating on the processor cores (i.e., core0/thread0, core0/thread1, core1/thread0, and core1/thread1). A given command unit (core or service processor) can obtain a lock by requesting a write for the appropriate lock bit, and thereafter reading that lock bit to see if it has been set. If no other locks were currently set, then the requesting command unit will be able to set its lock bit. After the command is completed, the lock bit is cleared by the originating command unit.
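The write-then-read-back locking sequence described in this paragraph can be modeled in software as follows. This is a sketch only: the class `XscomLock`, its methods, and the function `acquire` are hypothetical, and the assignment of xscomc(56:59) to the four threads in the listed order is an assumption. The model captures the stated rule that a lock bit can be set only when no other lock bit is currently set.

```python
# Lock bit per command unit; the thread-to-bit order is assumed.
LOCK_BITS = {
    "service_processor": 54,
    "core0/thread0": 56,
    "core0/thread1": 57,
    "core1/thread0": 58,
    "core1/thread1": 59,
}


class XscomLock:
    """Software model of the hardware lock bits xscomc(54:59)."""

    def __init__(self):
        self.held_bit = None  # at most one lock bit is set at any time

    def request(self, owner):
        """Model a write to the owner's lock bit; ignored if a lock is held."""
        if self.held_bit is None:
            self.held_bit = LOCK_BITS[owner]

    def is_set(self, owner):
        """Model reading back the owner's lock bit."""
        return self.held_bit == LOCK_BITS[owner]

    def release(self, owner):
        """Clear the lock bit; only the unit that holds it has any effect."""
        if self.is_set(owner):
            self.held_bit = None


def acquire(lock, owner):
    """Request the lock, then read the bit back to see whether it was set."""
    lock.request(owner)
    return lock.is_set(owner)
```

A caller would typically loop on `acquire` until it returns `True`, issue its XSCOM transaction, and then call `release`.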

[0040] The last four status bits xscomc(60:63) are used to signal conditions which may require a retry of an XSCOM command. A hardware error bit xscomc(60) is set when a hardware error has occurred, such as a timeout or a cyclical redundancy check (CRC) error. An XSCOM collision bit xscomc(61) signals a protocol error arising from conflicting requests. An address not accepted bit xscomc(62) indicates that the target satellite address was not accepted by any satellite on any of the chips within the selected PID group. A busy/disabled bit xscomc(63) is set when the target satellite is unable to currently handle the XSCOM command.
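A routine that checks these four status bits after a command completes could be sketched as below. It assumes bit 0 of xscomc is the most significant bit, so that xscomc(63) is the least significant bit of the 64-bit value; the function name `retry_conditions` and the condition labels are hypothetical.

```python
# Status bits that may call for a retry, keyed by xscomc bit number.
STATUS_BITS = {
    60: "hardware_error",
    61: "collision",
    62: "address_not_accepted",
    63: "busy_or_disabled",
}


def retry_conditions(cmd):
    """List the retry-related conditions flagged in a command register value."""
    return [name for bit, name in STATUS_BITS.items()
            if (cmd >> (63 - bit)) & 1]
```

An empty list means none of the four conditions was signaled; any other result names the conditions a supervisory routine might inspect before deciding whether to reissue the command.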

[0041] All bits in the XSCOM command register are set to zero during power-on reset.

[0042] Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. It is therefore contemplated that such modifications can be made without departing from the spirit or scope of the present invention as defined in the appended claims.

What is claimed is:
 1. A method of communicating between processingunits in a multi-processor computer system, comprising the steps of:issuing a command from a source processing unit to a destinationprocessing unit, wherein the source and destination processing units arephysically located on different integrated circuit chips; receiving thecommand at the destination processing unit, while the destinationprocessing unit is processing program instructions; and accessingregisters in clock-controlled components of the destination processingunit, in response to said receiving step, without interruptingprocessing of the program instructions by the destination processingunit.
 2. The method of claim 1 wherein the multi-processor computersystem has more than two processing units interconnected in a ringtopology, and the command is passed from the source processing unit tothe destination processing unit by at least one other processing unit.3. The method of claim 2 wherein each of the processing units has arespective, unique identification number (PID), and said receiving stepincludes the step of matching a chip tag embedded in the command to aPID of the destination processing unit.
 4. The method of claim 2 whereinthe command is a broadcast command directed to a plurality of theprocessing units, and further comprising the step of blocking thebroadcast command from being executed by the source processing unit. 5.The method of claim 1 wherein the processing units are interconnectedwith a system memory device and a service processor, and said issuingand receiving steps utilize an additional communications line thatfollows a topology of a fabric bus.
 6. The method of claim 1 whereinsaid accessing step includes the step of reading data from registers ofthe destination processing unit.
 7. The method of claim 1 wherein saidaccessing step includes the step of writing data to registers of thedestination processing unit.
 8. A mechanism for cross-chipcommunications in a multi-processor computer system, comprising: aprocessing unit having a plurality of clock-controlled componentsincluding at least one processor core which processes programinstructions; a plurality of scan registers located in saidclock-controlled components of said processing unit; a commandcontroller connected to said scan registers which executes accesscommands to selectively read from and write to said scan registerswithout interrupting processing of the program instructions by saidprocessor core; and an external command interface, connected to saidcommand controller, which receives and transmits access commands.
 9. The mechanism of claim 8 wherein said external command interface has an input adapted to receive an access command from a previous processing unit, and an output adapted to transmit the access command from the previous processing unit to a next processing unit.
 10. The mechanism of claim 8 wherein said processing unit has a unique identification number (PID), and said external command interface passes a received access command to said command controller when the PID matches a chip tag embedded in the received access command.
 11. The mechanism of claim 8 wherein said processing unit includes a special tag which is not necessarily unique, and said external command interface passes a received access command to said command controller when the special tag matches a chip tag embedded in the received access command.
 12. The mechanism of claim 8 wherein said external command interface blocks a broadcast command from being executed by said processing unit when the broadcast command originated with said processing unit.
 13. The mechanism of claim 8, further comprising: a fabric communications bus for handling communications with said clock-controlled components; and a communications line connected to said external command interface.
 14. The mechanism of claim 8 wherein said external command interface reads data from registers of a different processing unit.
 15. The mechanism of claim 8 wherein said external command interface writes data to registers of a different processing unit.
 16. A computer system comprising: a memory hierarchy for storing program instructions and operand data; a fabric communications bus interconnected with said memory hierarchy; and a plurality of processing units interconnected with said fabric communications bus, each said processing unit having a plurality of clock-controlled components including at least one processor core which processes program instructions, a plurality of scan registers located in said clock-controlled components, a command controller connected to said scan registers which executes access commands to selectively read from and write to said scan registers without interrupting processing of the program instructions by said processor core, and an external command interface, connected to said command controller, which receives and transmits access commands.
 17. The computer system of claim 16 wherein said external command interface has an input adapted to receive an access command from a previous one of said processing units, and an output adapted to transmit the access command from said previous processing unit to a next one of said processing units.
 18. The computer system of claim 16 wherein each of said processing units has a respective, unique identification number (PID), and said external command interface passes a received access command to said command controller when the PID matches a chip tag embedded in the received access command.
 19. The computer system of claim 16 wherein said external command interface blocks a broadcast command from being executed by its processing unit when the broadcast command originated with its processing unit.
 20. The computer system of claim 16 wherein said external command interface is connected to a communications line which follows a topology of said fabric communications bus.
 21. The computer system of claim 16 wherein said external command interface reads data from registers of a previous one of said processing units.
 22. The computer system of claim 16 wherein said external command interface writes data to registers of a next one of said processing units.
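
For illustration only, the command routing recited in the claims above (ring forwarding, chip-tag matching against a PID, and blocking a broadcast at its originating chip) can be sketched as a small software model. This is a minimal sketch and not the claimed hardware: the names `Command`, `Chip`, `send`, and `BROADCAST`, the dictionary-based register file, and the single-direction ring walk are all assumptions introduced here, and the model does not represent the property that register access proceeds without interrupting program execution.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BROADCAST = -1  # hypothetical chip-tag value meaning "every chip on the ring"


@dataclass
class Command:
    source_pid: int              # PID of the issuing processing unit
    chip_tag: int                # destination PID, or BROADCAST
    op: str                      # "read" or "write"
    addr: int                    # register address (illustrative)
    data: Optional[int] = None   # payload for writes
    replies: Dict[int, int] = field(default_factory=dict)  # pid -> value read


@dataclass
class Chip:
    pid: int
    block_self: bool = True      # assumed default for the "Block Self" mode
    registers: Dict[int, int] = field(default_factory=dict)

    def accepts(self, cmd: Command) -> bool:
        """Decide whether the external command interface hands cmd to the controller."""
        if cmd.chip_tag == BROADCAST:
            # A broadcast is blocked at the chip that originated it.
            return not (self.block_self and cmd.source_pid == self.pid)
        return cmd.chip_tag == self.pid

    def execute(self, cmd: Command) -> None:
        """Perform the register access on behalf of the command controller."""
        if cmd.op == "write":
            self.registers[cmd.addr] = cmd.data
        elif cmd.op == "read":
            cmd.replies[self.pid] = self.registers.get(cmd.addr, 0)
        else:
            raise ValueError(f"unknown op: {cmd.op}")


def send(ring: List[Chip], cmd: Command) -> List[int]:
    """Pass cmd chip to chip around the ring; return the PIDs it visited, in order."""
    start = next(i for i, c in enumerate(ring) if c.pid == cmd.source_pid)
    visited = []
    for step in range(1, len(ring) + 1):
        chip = ring[(start + step) % len(ring)]
        visited.append(chip.pid)
        if chip.accepts(cmd):
            chip.execute(cmd)
            if cmd.chip_tag != BROADCAST:
                break  # a directed command stops at its destination
    return visited
```

In this model, a directed write from PID 0 to PID 2 on a four-chip ring passes through PID 1 without touching its registers, while a broadcast write from PID 0 reaches PIDs 1, 2, and 3 and is ignored when it arrives back at PID 0.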